Adaptive-sampling algorithms for answering aggregation queries on Web sites

Authors

  • Foto N. Afrati
  • Paraskevas V. Lekeas
  • Chen Li
Abstract

Many Web sites publish their data in a hierarchical structure. For instance, Amazon.com organizes its pages on books as a hierarchy, in which each leaf node corresponds to a collection of pages of books in the same class (e.g., books on Data Mining). Users can easily browse this class by following a path from the root to the corresponding leaf node, such as “Computers & Internet – Databases – Storage – Data Mining”. Business applications often need to submit aggregation queries on such data, such as “finding the average price of books on Data Mining”. However, computing the exact answer to such a query is expensive because of the large amount of data, its dynamic nature, and limited Web-access resources. In this paper, we study how to answer such aggregation queries approximately with quality guarantees using sampling. We study how to use adaptive-sampling techniques that allocate the resources adaptively based on partial samples retrieved from different nodes in the hierarchy. Based on statistical methods, we study how to estimate the quality of the answer using the sample. Our experimental study using real and synthetic data sets validates the proposed techniques. © 2007 Elsevier B.V. All rights reserved.
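The general idea of allocating a sampling budget adaptively can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. It is a generic two-phase scheme under stated assumptions: a pilot sample per leaf node, then Neyman allocation of the remaining budget, then a normal-approximation confidence interval. The function name adaptive_average, its parameters, and the in-memory node data are all hypothetical; every node is assumed to be non-empty, and the budget is assumed to cover the pilot samples.

```python
import math
import random
import statistics


def adaptive_average(nodes, budget, pilot=5, z=1.96, seed=0):
    """Estimate the mean over all leaf nodes with a two-phase sample.

    nodes:  dict mapping a leaf name to its list of values (hypothetical
            in-memory data; on a real site each draw would be a page fetch).
    budget: total number of values we may retrieve (assumed to be at
            least pilot * number of nodes).
    Returns (estimate, half_width) of an approximate confidence interval.
    """
    rng = random.Random(seed)
    # Shuffle each node once; a prefix is then a sample without replacement.
    order = {k: rng.sample(v, len(v)) for k, v in nodes.items()}
    taken = {k: min(pilot, len(v)) for k, v in nodes.items()}

    # Phase 1: the pilot sample gives a rough spread per node.
    spread = {k: statistics.stdev(order[k][:taken[k]]) if taken[k] > 1 else 0.0
              for k in nodes}

    # Phase 2: split the remaining budget proportionally to size * spread
    # (Neyman allocation), so large, high-variance nodes get more samples.
    remaining = max(0, budget - sum(taken.values()))
    weights = {k: len(nodes[k]) * spread[k] for k in nodes}
    total_w = sum(weights.values())
    for k in nodes:
        extra = int(remaining * weights[k] / total_w) if total_w > 0 else 0
        taken[k] = min(len(nodes[k]), taken[k] + extra)

    # Stratified estimate of the mean and of its variance.
    total = sum(len(v) for v in nodes.values())
    estimate, variance = 0.0, 0.0
    for k, values in nodes.items():
        sample = order[k][:taken[k]]
        share = len(values) / total
        estimate += share * statistics.fmean(sample)
        if len(sample) > 1:
            fpc = 1 - len(sample) / len(values)  # finite-population correction
            variance += share ** 2 * fpc * statistics.variance(sample) / len(sample)
    return estimate, z * math.sqrt(variance)
```

In this sketch, a node whose pilot values are all equal receives no extra samples, so the budget concentrates on nodes where more samples reduce the uncertainty of the answer.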


Similar resources

Answering Aggregation Queries on Hierarchical Web Sites Using

Many Web sites publish their data in a hierarchical structure. For instance, Amazon.com organizes its pages about books as a hierarchy, in which each leaf node corresponds to a collection of pages of books in the same class (e.g., books on Data Mining). This class can be browsed easily by users following a path from the root to the corresponding leaf node, such as “Computers & Internet — Databa...


Efficient Ad-hoc Approximate Query Processing in Peer-to-Peer Databases

This paper appeared in the 22nd International Conference on Data Engineering (ICDE), Atlanta, Georgia, 2006. Peer-to-peer databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large scale, ad-hoc analysis queries – e.g., aggregation queries – on these databases poses unique challenge...


Overcoming Limitations of Sampling for Aggregation Queries

We study the problem of approximately answering aggregation queries using sampling. We observe that uniform sampling performs poorly when the distribution of the aggregated attribute is skewed. To address this issue, we introduce a technique called outlier-indexing. Uniform sampling is also ineffective for queries with low selectivity. We rely on weighted sampling based on workload information ...


Complexity of Answering Counting Aggregate Queries over DL-Lite

The ontology-based data access model assumes that users access data by means of an ontology, which is often described in terms of description logics. As a consequence, languages for managing ontologies now need algorithms not only to decide standard reasoning problems, but also to answer database-like queries. However, fundamental database aggregate queries, such as the ones using functions COUN...


A New Model for Phrase Search Based on Minimum Weighted Displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...



Journal:
  • Data Knowl. Eng.

Volume 64

Publication date: 2008